(57) Abstract

A CPU or microprocessor which includes a general purpose CPU component, such as an X86 core, and also includes a DSP core. The CPU also includes an intelligent DSP function decoder or preprocessor which examines X86 opcode sequences and determines if a DSP function is being executed. If the DSP function decoder determines that a DSP function is being executed, the DSP function decoder converts or maps the opcodes to a DSP macro instruction that is provided to the DSP core. The DSP core executes one or more DSP instructions to implement the desired DSP function in response to the macro instruction. The DSP core implements or performs the DSP function using a lesser number of instructions and also in a reduced number of clock cycles, thus increasing system performance. If the X86 opcodes in the instruction cache or instruction memory do not indicate or are not intended to perform a DSP-type function, the opcodes are provided to the X86 core, as occurs in current prior art computer systems. The X86 core and the DSP core are coupled to each other and communicate data and timing signals for synchronization purposes. Thus, the DSP core offloads these mathematical functions from the X86 core, thereby increasing system performance. The DSP core also operates in parallel with the X86 core, providing further performance benefits. The CPU of the present invention thus implements DSP functions more efficiently than X86 logic while requiring no additional X86 opcodes. The present invention also generates code that operates transparently on an X86-only CPU or on a CPU according to the present invention which includes X86 and DSP cores. Thus the present invention is backwards compatible with existing software.
Title: Central Processing Unit Having an X86 and DSP Core and Including a DSP Function Decoder which Maps X86 Instructions to DSP Instructions

Field of the Invention 

The present invention relates to a computer system CPU or microprocessor which includes a general purpose core and a DSP core, wherein the CPU includes a DSP function decoder which detects general purpose opcode sequences intended to perform DSP-type functions and converts these opcodes into corresponding DSP macros for execution by the DSP core.

Description of the Related Art

Personal computer systems and general purpose microprocessors were originally developed for 
business applications such as word processing and spreadsheets, among others. However, computer systems are 
currently being used to handle a number of real time DSP-related applications, including multimedia applications 
having video and audio components, video capture and playback, telephony applications, speech recognition and 
synthesis, and communication applications, among others. These real time or DSP-like applications typically
require increased CPU floating point performance. 

One problem that has arisen is that general purpose microprocessors originally designed for business 
applications are not well suited for the real-time requirements and mathematical computation requirements of 
modern DSP-related applications, such as multimedia applications and communications applications. For example, the X86 family of microprocessors from Intel Corporation is oriented toward integer-based calculations and memory management operations and does not perform DSP-type functions very well.

As personal computer systems have evolved toward more real-time and multimedia capable systems, 
the general purpose CPU has been correspondingly required to perform more mathematically intensive DSP-type 
functions. Therefore, many computer systems now include one or more digital signal processors which are dedicated towards these complex mathematical functions.

A recent trend in computer system architectures is the movement toward "native signal processing (NSP)". Native signal processing or NSP was originally introduced by Intel Corporation as a strategy to offload certain functions from DSPs and perform these functions within the main or general purpose CPU. The strategy presumes that, as performance and clock speeds of general purpose CPUs increase, the general purpose CPU is able to perform many of the functions formerly performed by dedicated DSPs. Thus, one trend in the microprocessor industry is an effort to provide CPU designs with higher speeds and augmented with DSP-type capabilities, such as more powerful floating point units. Another trend in the industry is for DSP manufacturers to provide DSPs that not only run at high speeds but also can emulate CPU-type capabilities such as memory management functions.

A digital signal processor is essentially a general purpose microprocessor which includes special hardware for executing mathematical functions at speeds and efficiencies not usually associated with microprocessors. In current computer system architectures, DSPs are used as co-processors that operate in conjunction with general purpose CPUs within the system. For example, current computer systems may include a general purpose CPU as the main CPU and include one or more multimedia or communication expansion cards which include dedicated DSPs. The CPU offloads mathematical functions to the digital signal processor, thus increasing system efficiency.

Digital signal processors include execution units that comprise one or more arithmetic logic units (ALUs) coupled to hardware multipliers which implement complex mathematical algorithms in a pipelined manner. The instruction set primarily comprises DSP-type instructions and also includes a small number of instructions having non-DSP functionality.

The DSP is typically optimized for mathematical algorithms such as correlation, convolution, finite impulse response (FIR) filters, infinite impulse response (IIR) filters, Fast Fourier Transforms (FFTs), matrix computations, and inner products, among other operations. Implementations of these mathematical algorithms generally comprise long sequences of systematic arithmetic/multiplicative operations. These operations are interrupted on various occasions by decision-type commands. In general, the DSP sequences are a repetition of a very small set of instructions that are executed 70% to 90% of the time. The remaining 10% to 30% of the instructions are primarily Boolean/decision operations (or general data processing).

A general purpose CPU is comprised of an execution unit, a memory management unit, and a floating point unit, as well as other logic. The task of a general purpose CPU is to execute code and perform operations on data in the computer memory and thus to manage the computing platform. In general, the general purpose CPU architecture is designed primarily to perform Boolean/management/data manipulation decision operations. The instructions or opcodes executed by a general purpose CPU include basic mathematical functions. However, these mathematical functions are not well adapted to complex DSP-type mathematical operations. Thus a general purpose CPU is required to execute a large number of opcodes or instructions to perform basic DSP functions.

Therefore, a computer system and CPU architecture is desired which includes a general purpose CPU and which also performs DSP-type mathematical functions with increased performance. A CPU architecture is also desired which is backwards compatible with existing software applications which presume that the general purpose CPU is performing all of the mathematical computations. A new CPU architecture is further desired which provides increased mathematical performance for existing software applications.

One popular microprocessor used in personal computer systems is the X86 family of microprocessors. The X86 family of microprocessors includes the 8088, 8086, 80186, 80286, 80386, i486, Pentium, and P6 microprocessors from Intel Corporation. The X86 family of microprocessors also includes X86 compatible processors such as the 4486 and K5 processors from Advanced Micro Devices, the M1 processor from Cyrix Corporation, and the NextGen 5x86 and 6x86 processors from NextGen Corporation. The X86 family of microprocessors was primarily designed and developed for business applications. In general, the instruction set of the X86 family of microprocessors does not include sufficient mathematical or DSP functionality for modern multimedia and communications applications. Therefore, a new X86 CPU architecture is further desired which implements DSP functions more efficiently than current X86 processors, but also requires no additional opcodes for the X86 processor.
Summary of the Invention

The present invention comprises a CPU or microprocessor which includes a general purpose CPU component, such as an X86 core, and also includes a DSP core. The CPU includes an intelligent DSP function decoder or preprocessor which examines sequences of instructions or opcodes (X86 opcodes) and determines if a DSP function is being executed. If the DSP function decoder determines that a DSP function is being executed, the DSP function decoder converts or maps the instruction sequence to a DSP macro instruction or function identifier that is provided to the DSP core. The DSP core executes one or more DSP instructions to implement the desired DSP function indicated by the DSP macro or function identifier. The DSP core performs the DSP function in parallel with other operations performed by the general purpose CPU core. The DSP core also performs the DSP function using a lesser number of instructions and also in a reduced number of clock cycles, thus increasing system performance.
In the preferred embodiment, the CPU of the present invention includes an instruction memory or instruction cache which receives microprocessor instructions or opcodes from the system memory and stores these opcodes for use by the CPU. The CPU also includes a DSP function decoder or preprocessor, also referred to as an instruction sequence preprocessor, which analyzes instruction sequences in the instruction cache and intelligently determines when a DSP-type function is implemented by or represented by the instruction sequence. The function preprocessor scans ahead for instruction sequences in the instruction cache that implement DSP functions.

In one embodiment, the function preprocessor includes a pattern recognition detector which stores a plurality of bit patterns indicative of instruction sequences which implement DSP functions. The pattern recognition detector compares each pattern with an instruction sequence and determines if one of the patterns substantially matches the instruction sequence. In one embodiment, a substantial match occurs when a pattern matches the instruction sequence by greater than 90%. In another embodiment, the function preprocessor includes a look-up table which stores a plurality of bit pattern entries indicative of instruction sequences which implement DSP functions. The function preprocessor compares each pattern entry with an instruction sequence and determines if one of the entries exactly matches the instruction sequence. Other embodiments include a two stage determination using a look-up table and a pattern recognition detector.
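The two detection strategies described above can be illustrated with a short Python sketch. It assumes, purely for illustration, that an instruction sequence has been reduced to a list of opcode mnemonics, that the hypothetical SIGNATURES table holds one reference sequence per DSP function identifier, that the look-up table stage reports only exact matches, and that "matches by greater than 90%" means the fraction of agreeing positions measured against the longer of the two sequences. The pattern recognition stage runs only when the look-up fails, mirroring the two stage determination.

```python
from typing import Optional, Sequence

# Hypothetical signature table (illustrative only): function identifier ->
# opcode mnemonics of an X86 sequence that implements that DSP function.
SIGNATURES: dict[str, tuple[str, ...]] = {
    "inner_product_simple": (
        "mov", "mov", "mov", "mov", "mov", "fldz",
        "fld", "inc", "fld", "inc", "fmulp", "faddp", "loop",
    ),
}


def lookup_exact(seq: Sequence[str]) -> Optional[str]:
    """Look-up table stage: report a function only on an exact match."""
    for func_id, pattern in SIGNATURES.items():
        if tuple(seq) == pattern:
            return func_id
    return None


def match_fraction(seq: Sequence[str], pattern: Sequence[str]) -> float:
    """Fraction of positions that agree; extra or missing opcodes count as misses."""
    length = max(len(seq), len(pattern))
    if length == 0:
        return 0.0
    hits = sum(a == b for a, b in zip(seq, pattern))
    return hits / length


def recognize_pattern(seq: Sequence[str], threshold: float = 0.9) -> Optional[str]:
    """Pattern recognition stage: the best match must exceed the threshold."""
    best_id, best_score = None, 0.0
    for func_id, pattern in SIGNATURES.items():
        score = match_fraction(seq, pattern)
        if score > best_score:
            best_id, best_score = func_id, score
    return best_id if best_score > threshold else None


def detect(seq: Sequence[str]) -> Optional[str]:
    """Two stage determination: exact look-up first, then fuzzy matching."""
    return lookup_exact(seq) or recognize_pattern(seq)


exact = list(SIGNATURES["inner_product_simple"])
one_off = exact.copy()
one_off[7] = "add"  # 12 of 13 positions agree (about 92%)
print(detect(exact), detect(one_off), detect(["mov", "add"]))
# prints: inner_product_simple inner_product_simple None
```

With a 13-opcode signature a single substituted opcode still clears the 90% threshold, while two substitutions (11 of 13, about 85%) do not. The source describes comparing bit patterns, which this mnemonic-level sketch only approximates.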

In the preferred embodiment, the function preprocessor detects X86 instruction sequences which are intended to perform DSP-type functions such as convolution, correlation, Fast Fourier Transforms (FFTs), finite impulse response (FIR) filters, infinite impulse response (IIR) filters, inner products and matrix manipulation operations.

If the instructions in the instruction cache or instruction memory do not implement a DSP-type function, the instructions are provided to the general purpose or X86 core, or to one or more X86 execution units, as occurs in current prior art computer systems. Thus the X86 core executes general purpose X86 instructions which do not represent DSP functions.

When the function preprocessor detects a sequence of X86 instructions which implement a DSP function, i.e., are intended to perform a DSP-type function, the function preprocessor decodes the sequence of X86 instructions and generates a single macro or function identifier which represents the function indicated by the sequence of X86 instructions. The function preprocessor also examines information in the X86 instruction sequence and generates zero or more parameters which indicate the data values being used for the DSP-type operation. The function preprocessor then provides the function identifier and the various necessary parameters to the DSP core, or to one or more DSP execution units.

The DSP core receives the macro or function identifier and the respective parameters and uses the 
macro to index into a DSP microcode sequence which implements the indicated DSP function. The DSP core 
also uses the respective parameters in executing the DSP function. Since the DSP core is optimized for these 
DSP-type mathematical operations, the DSP core can generally execute the desired function in a reduced number 
of instructions and clock cycles. 
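A minimal Python sketch of this dispatch step follows. It assumes a hypothetical MICROCODE_ROM dictionary standing in for the DSP microcode store, and it assumes the parameters are word indices into a flat list that models memory. The function identifier selects the routine, and the parameters are passed through unchanged.

```python
from typing import Callable, Sequence


def _inner_product(memory: Sequence[float], address_1: int, address_2: int,
                   num_samples: int) -> float:
    """Routine stored under 'inner_product_simple' (illustrative)."""
    acc = 0.0
    for i in range(num_samples):
        acc += memory[address_1 + i] * memory[address_2 + i]
    return acc


# Hypothetical stand-in for the DSP microcode RAM/ROM, indexed by identifier.
MICROCODE_ROM: dict[str, Callable[..., float]] = {
    "inner_product_simple": _inner_product,
}


def dsp_execute(func_id: str, params: tuple, memory: Sequence[float]) -> float:
    """Index the 'ROM' with the function identifier and run the routine."""
    try:
        routine = MICROCODE_ROM[func_id]
    except KeyError:
        raise ValueError(f"unknown DSP function identifier: {func_id}") from None
    return routine(memory, *params)


memory = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(dsp_execute("inner_product_simple", (0, 3, 3), memory))  # 32.0
```

Keeping the routines in a table keyed by identifier means new DSP functions can be supported by adding entries rather than changing the dispatch logic.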

The DSP core executes in parallel with the general purpose CPU core. Thus X86 (non-DSP) opcodes are potentially executed by the general purpose CPU core or X86 core in parallel with DSP functions, assuming there is data independence. The general purpose core and the DSP core are coupled to each other and communicate data and timing signals for synchronization purposes. In one embodiment, a cache or buffer is included between the general purpose core and the DSP core for the transfer of information between the two units.
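The data independence condition can be made concrete with a small Python sketch. It assumes, for illustration only, that each core's memory accesses are summarized as byte ranges; the hypothetical can_run_in_parallel check reports a conflict whenever one side writes a range that the other side reads or writes. The specification does not state how independence is actually determined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    start: int   # first byte address
    length: int  # size in bytes

    def overlaps(self, other: "Range") -> bool:
        """Half-open intervals [start, start + length) intersect."""
        return (self.start < other.start + other.length
                and other.start < self.start + self.length)


def can_run_in_parallel(dsp_reads: list[Range], dsp_writes: list[Range],
                        x86_reads: list[Range], x86_writes: list[Range]) -> bool:
    """True when neither unit writes memory that the other unit touches."""
    for w in dsp_writes:
        if any(w.overlaps(r) for r in x86_reads + x86_writes):
            return False
    for w in x86_writes:
        if any(w.overlaps(r) for r in dsp_reads):
            return False
    return True


# Two 20-element vectors of 4-byte values, read by the DSP core.
vectors = [Range(0x1000, 80), Range(0x2000, 80)]
print(can_run_in_parallel(vectors, [], [], [Range(0x3000, 4)]))  # True
print(can_run_in_parallel(vectors, [], [], [Range(0x1010, 4)]))  # False
```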

Thus, the general purpose CPU portion executes X86 instructions as in prior systems. However, for those instruction sequences which are intended to perform DSP-type functions, the function preprocessor intelligently detects these sequences and provides a corresponding macro and parameters to the DSP core. Thus, the DSP core offloads these mathematical functions from the general purpose core, thereby increasing system performance. The DSP core also operates in parallel with the general purpose core, providing further performance benefits.

Therefore the present invention comprises a general purpose CPU including a DSP core which 
performs DSP operations. The CPU includes an intelligent DSP function decoder or preprocessor which 
examines instruction sequences and converts or maps sequences which perform DSP functions to a DSP macro 
instruction for execution by the DSP core. The DSP core uses the DSP macro instruction to implement the 
desired DSP function. The DSP core implements or performs the DSP function in a lesser number of 
instructions and also in a reduced number of clock cycles, thus increasing system performance. The CPU of the 
present invention thus implements DSP functions more efficiently than X86 logic while requiring no additional 
X86 opcodes. The CPU of the present invention also executes code that operates on an X86-only CPU, thus
providing backwards compatibility with existing software. Further, code written for the CPU of the present 
invention also operates properly on an X86-only CPU. 
Brief Description of the Drawings

A better understanding of the present invention can be obtained when the following detailed description 
of the preferred embodiment is considered in conjunction with the following drawings, in which: 

Figure 1 is a block diagram of a computer system including a CPU having a general purpose CPU core 
and a DSP core according to the present invention;

Figure 2 is a block diagram of the CPU of Figure 1 including a general purpose CPU core and a DSP 
core and including a DSP function preprocessor according to the present invention; 

Figure 3 is a flowchart diagram illustrating operation of the present invention; 

Figure 4 is a more detailed block diagram of the CPU of Figure 1; 

Figure 5 is a block diagram of the Instruction Decode Unit of Figure 4;

Figure 6 is a block diagram of the function preprocessor according to one embodiment of the invention;

Figure 7 is a block diagram of the function preprocessor including a pattern recognition detector 
according to one embodiment of the invention; 

Figure 8 illustrates operation of the pattern recognition detector of Figure 7;

Figure 9 is a block diagram of the function preprocessor including a look-up table according to one embodiment of the invention;

Figure 10 illustrates operation of the look-up table of Figure 9; and

Figure 11 is a block diagram of the function preprocessor including a pattern recognition detector and a 
look-up table according to one embodiment of the invention. 
Detailed Description of the Preferred Embodiment

Computer System Block Diagram

Referring now to Figure 1, a block diagram of a computer system incorporating a central processing unit (CPU) or microprocessor 102 according to the present invention is shown. The computer system shown in Figure 1 is illustrative only, and the CPU 102 of the present invention may be incorporated into any of various types of computer systems.

As shown, the CPU 102 includes a general purpose CPU core 212 and a DSP core 214. The general purpose core 212 executes general purpose (non-DSP) opcodes and the DSP core 214 executes DSP-type functions, as described further below. In the preferred embodiment, the general purpose CPU core 212 is an X86 core, i.e., is compatible with the X86 family of microprocessors. However, the general purpose CPU core 212 may be any of various types of CPUs, including the PowerPC family, the DEC Alpha, and the Sun SPARC family of processors, among others. In the following disclosure, the general purpose CPU core 212 is referred to as an X86 core for convenience. The general purpose core 212 may comprise one or more general purpose execution units, and the DSP core 214 may comprise one or more digital signal processing execution units.

As shown, the CPU 102 is coupled through a CPU local bus 104 to a host/PCI/cache bridge or chipset 106. The chipset 106 is preferably similar to the Triton chipset available from Intel Corporation. A second level or L2 cache memory (not shown) may be coupled to a cache controller in the chipset, as desired. Also, for some processors the external cache may be an L1 or first level cache. The bridge or chipset 106 couples through a memory bus 108 to main memory 110. The main memory 110 is preferably DRAM (dynamic random access memory) or EDO (extended data out) memory, or other types of memory, as desired.

The chipset 106 includes various peripherals, including a real time clock (RTC), flash memory, communications ports, diagnostics ports, and non-volatile static random access memory (NVSRAM) (all not shown).

The host/PCI/cache bridge or chipset 106 interfaces to a peripheral component interconnect (PCI) bus 120. In the preferred embodiment, a PCI local bus is used. However, it is noted that other local buses may be used, such as the VESA (Video Electronics Standards Association) VL bus. Various types of devices may be connected to the PCI bus 120. In the embodiment shown, a video adapter 170 and a network interface controller 140 are coupled to the PCI bus 120. The video adapter connects to a video monitor, and the network interface controller 140 couples to a local area network (LAN). A SCSI (small computer systems interface) adapter 122 may also be coupled to the PCI bus 120, as shown. The SCSI adapter 122 may couple to various SCSI devices 124, such as a CD-ROM drive and a tape drive, as desired. Various other devices may be connected to the PCI bus 120, as is well known in the art.

Expansion bus bridge logic 150 may also be coupled to the PCI bus 120. The expansion bus bridge logic 150 interfaces to an expansion bus 152. The expansion bus 152 may be any of varying types, including the industry standard architecture (ISA) bus, also referred to as the AT bus, the extended industry standard architecture (EISA) bus, or the MicroChannel architecture (MCA) bus. Various devices may be coupled to the expansion bus 152, such as expansion bus memory 154 and a modem 156.
CPU Block Diagram

Referring now to Figure 2, a high level block diagram illustrating certain components in the CPU 102 of Figure 1 is shown. As shown, the CPU 102 includes an instruction cache or instruction memory 202 which receives instructions or opcodes from the system memory 110. Function preprocessor 204 is coupled to the instruction memory 202 and examines instruction sequences or opcode sequences in the instruction memory 202. The function preprocessor 204 is also coupled to the X86 core 212 and the DSP core 214. As shown, the function preprocessor 204 provides instructions or opcodes to the X86 core 212 and also provides information to the DSP core 214.

The X86 core 212 and DSP core 214 are coupled together and provide data and timing signals between each other. In one embodiment, the CPU 102 includes one or more buffers (not shown) which interface between the X86 core 212 and the DSP core 214 to facilitate transmission of data between the X86 core 212 and the DSP core 214.

Figure 3 - Flowchart

Referring now to Figure 3, a flowchart diagram illustrating operation of the present invention is shown. It is noted that two or more of the steps in Figure 3 may operate concurrently, and the operation of the invention is shown in flowchart form for convenience. As shown, in step 302 the instruction memory 202 receives and stores a plurality of X86 instructions. The plurality of X86 instructions may include one or more instruction sequences which implement a DSP function. In step 304 the function preprocessor 204 analyzes the opcodes, i.e., an instruction sequence, in the instruction memory 202, and in step 306 intelligently determines if the sequence of instructions is designed to perform a DSP-type function, i.e., determines if the instruction sequence implements a DSP-type function. In the present disclosure, a DSP-type function comprises one or more of the following mathematical functions: correlation, convolution, Fast Fourier Transform, finite impulse response filter, infinite impulse response filter, inner product, and matrix manipulation, among others.

The operation of the function preprocessor 204 is described more fully in the description associated with Figure …

If the instructions or opcodes stored in the instruction cache 202 do not correspond to a DSP-type function, the instructions are provided to the X86 core 212 in step 308. Thus, these instructions or opcodes are provided directly from the instruction cache 202 to the X86 core 212 for execution, as occurs in prior art X86 compatible CPUs. After the opcodes are transferred to the X86 core 212, in step 310 the X86 core 212 executes the instructions.

If the function preprocessor 204 detects a sequence of instructions which correspond to or implement a DSP-type function in step 306, then in step 312 the function preprocessor 204 analyzes the sequence of instructions and determines the respective DSP-type function being implemented. In step 312 the function preprocessor 204 maps the sequence of instructions to a respective DSP macro identifier, also referred to as a function identifier. The function preprocessor 204 also analyzes the information in the sequence of opcodes in step 312 and generates zero or more parameters for use by the DSP core or accelerator 214 in executing the function identifier. As shown, in step 314 the function preprocessor 204 provides the function identifier and the parameters to the DSP core 214.
The DSP core 214 receives the function identifier and the associated parameters from the function preprocessor 204 and in step 316 performs the respective DSP function. In the preferred embodiment, the DSP core 214 uses the function identifier to index into a DSP microcode RAM or ROM to execute a sequence of DSP instructions or opcodes. The DSP instructions cause the DSP to perform the desired DSP-type function. The DSP core 214 also uses the respective parameters in executing the DSP function.
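Steps 304 through 316 amount to a routing decision for each instruction sequence. The following Python sketch is illustrative only: the detector, the parameter extractor, and the two execution back ends are passed in as hypothetical callables, so it shows the control flow of the flowchart rather than how any individual step is implemented.

```python
from typing import Any, Callable, Optional, Sequence

Seq = Sequence[str]


def route(sequence: Seq,
          detect: Callable[[Seq], Optional[str]],
          extract_params: Callable[[Seq], tuple],
          x86_execute: Callable[[Seq], Any],
          dsp_execute: Callable[[str, tuple], Any]) -> tuple[str, Any]:
    """Send one instruction sequence down the X86 path or the DSP path."""
    func_id = detect(sequence)                  # steps 304-306: analyze, decide
    if func_id is None:
        return "x86", x86_execute(sequence)     # steps 308-310: X86 core executes
    params = extract_params(sequence)           # step 312: identifier + parameters
    return "dsp", dsp_execute(func_id, params)  # steps 314-316: DSP core executes


# Toy back ends for demonstration only.
print(route(["mov", "add"], lambda s: None, lambda s: (),
            lambda s: f"{len(s)} opcodes on X86", lambda f, p: None))
print(route(["fld", "fmulp"], lambda s: "inner_product_simple",
            lambda s: ("address_1", "address_2", "num_samples"),
            lambda s: None, lambda f, p: (f, p)))
```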

As mentioned above, the X86 core 212 and DSP core 214 are coupled together and provide data and timing signals between each other. In the preferred embodiment, the X86 core 212 and DSP core 214 operate substantially in parallel. Thus, while the X86 core 212 is executing one sequence of opcodes, the DSP accelerator 214 may be executing one or more DSP functions corresponding to another sequence of opcodes. Thus, the DSP core 214 does not operate as a slave or co-processor, but rather operates as an independent execution unit or pipeline. The DSP core 214 and the X86 core 212 provide data and timing signals to each other to indicate the status of operations and also to provide any data outputs produced, as well as to ensure data coherency/independence.

Example Operation 

The following describes an example of how a string or sequence of X86 opcodes is converted into a function identifier and then executed by the DSP core or accelerator 214 according to the present invention. The following X86 opcode sequence performs a simple inner product computation, wherein the inner product is computed over a vector comprising 20 values:

                    X86 Code
             (Simple inner product)

    Cycles  Instruction                      Comment
    1       Mov   ECX, num_samples;          {Set up parameters for macro}
    1       Mov   ESI, address_1;
    1       Mov   EDI, address_2;
    1       Mov   EAX, 0;                    {Initialize vector indices}
    1       Mov   EBX, 0;
    4       FLdZ;                            {Initialize sum of products}
            Again:
    1       ...                              {Update counter}
    4       FLd   dword ptr [ESI+EAX*4];     {Get vector elements and}
    1       Inc   EAX;                       {update indices}
    4       FLd   dword ptr [EDI+EBX*4];
    1       Inc   EBX;
    16      FMulP St(1), St;                 {Compute product term}
    7       FAddP St(1), St;                 {Add term to sum}
    1       LOOP  Again;                     {Continue if more terms}
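To make the floating point stack manipulation in this listing easier to follow, the following Python sketch mimics what each instruction does, with a Python list standing in for the x87 register stack (its last element is St, the top of stack). It is an illustrative model only: the elided counter-update instruction is omitted, LOOP is modeled as decrementing ECX and branching back while ECX is nonzero, and cycle timing is not modeled.

```python
def x86_inner_product(vec1: list[float], vec2: list[float]) -> float:
    """Model of the listing's register and FPU-stack behaviour."""
    # With ECX = 0 a real LOOP would wrap around, so reject empty input.
    if not vec1 or len(vec1) != len(vec2):
        raise ValueError("vectors must be non-empty and of equal length")
    st: list[float] = []
    ecx = len(vec1)            # Mov ECX, num_samples
    eax = ebx = 0              # Mov EAX, 0 / Mov EBX, 0
    st.append(0.0)             # FLdZ: push +0.0 as the running sum
    while True:                # Again:
        st.append(vec1[eax])   # FLd dword ptr [ESI+EAX*4]
        eax += 1               # Inc EAX
        st.append(vec2[ebx])   # FLd dword ptr [EDI+EBX*4]
        ebx += 1               # Inc EBX
        top = st.pop()         # FMulP St(1), St: St(1) = St(1) * St, then pop
        st[-1] *= top
        top = st.pop()         # FAddP St(1), St: St(1) = St(1) + St, then pop
        st[-1] += top
        ecx -= 1               # LOOP Again: decrement ECX ...
        if ecx == 0:           # ... and branch back while it is nonzero
            break
    return st.pop()            # the sum is left on top of the stack


print(x86_inner_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

Each iteration pushes two operands and pops two results, so the stack depth returns to one (the running sum) at the LOOP instruction.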

As shown, the X86 opcode instructions for a simple inner product comprise a plurality of move instructions followed by a loop of load, multiply, and add instructions which is repeated a plurality of times. If this X86 opcode sequence were executed by the X86 core 212, the execution time for this inner product computation would be 709 cycles (9 + 20 × 35), i.e., 9 cycles of setup plus 35 cycles for each of the 20 loop iterations. This assumes i486 timing, concurrent execution of floating point operations, and cache hits for all instructions and data required for the inner product computation. The function preprocessor 204 analyzes the sequence of opcodes and detects that the opcodes are performing an inner product computation. The function preprocessor 204 then converts this entire sequence of X86 opcodes into a single macro or function identifier and one or more parameters. An example macro or function identifier that is created based on the X86 opcode sequence shown above would be as follows:
                  Example Macro
           (as it appears in assembler)

    Inner_product_simple (
        address_1,          {Data vector}
        address_2,          {Data vector}
        num_samples);       {Length of vector}
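The mapping from the setup instructions to the macro parameters can be sketched as follows. This Python example is hypothetical: it assumes the preprocessor has already recognized the sequence as an inner product, and it simply reads the operands of the Mov instructions that load ESI, EDI, and ECX, which the X86 listing uses for the two vector addresses and the vector length.

```python
import re
from typing import NamedTuple

# Matches lines such as "Mov ECX, num_samples;" (case-insensitive).
MOV_RE = re.compile(r"^\s*mov\s+(\w+)\s*,\s*(\w+)\s*;?\s*$", re.IGNORECASE)


class Macro(NamedTuple):
    name: str
    params: tuple[str, ...]


def build_inner_product_macro(setup_lines: list[str]) -> Macro:
    """Collect register operands and order them as the macro expects."""
    regs: dict[str, str] = {}
    for line in setup_lines:
        m = MOV_RE.match(line)
        if m:
            regs[m.group(1).upper()] = m.group(2)
    # ESI -> address_1, EDI -> address_2, ECX -> num_samples
    return Macro("Inner_product_simple", (regs["ESI"], regs["EDI"], regs["ECX"]))


setup = ["Mov ECX, num_samples;", "Mov ESI, address_1;", "Mov EDI, address_2;",
         "Mov EAX, 0;", "Mov EBX, 0;"]
print(build_inner_product_macro(setup))
```

The Mov EAX and Mov EBX lines are parsed but not used, since the index registers are internal to the loop and have no counterpart among the macro parameters.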

This function identifier and one or more parameters are provided to the DSP core 214. The DSP core 214 uses the macro provided by the function preprocessor 204 to load one or more DSP opcodes or instructions which execute the DSP function. In the preferred embodiment the DSP core 214 uses the macro to index into a ROM which contains the instructions used for executing the DSP function. In this example, the DSP code or instructions executed by the DSP core 214 in response to receiving the macro described above are shown below:



                    DSP Code
             (Simple inner product)

    Cntr = num_samples;         {Set up parameters from macro}
    ptr1 = address_1;
    ptr2 = address_2;
    MAC  = 0;                   {Initialize sum of products}
    reg1 = *ptr1++;             {Pre-load multiplier input registers}
    reg2 = *ptr2++;
    Do LOOP until ce;           {Specify loop parameters}
        MAC += reg1 * reg2,     {Form sum of products}
        reg1 = *ptr1++,
        reg2 = *ptr2++;
    LOOP:                       {Continue if more terms}
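The following Python sketch models this DSP routine, including the software pipelining in which the next pair of operands is loaded in the same step as the multiply-accumulate. It assumes a word-addressed memory modeled as a Python list, and it reads "ce" as "counter expired" (an assumption). Because the final pre-load reads one word beyond each vector and that value is never used, the hypothetical load helper returns 0 for addresses past the end of the list instead of raising an error.

```python
def dsp_inner_product(mem: list[float], address_1: int, address_2: int,
                      num_samples: int) -> float:
    """Model of the pre-loaded multiply-accumulate loop (timing not modeled)."""

    def load(ptr: int) -> float:
        # Reads past the end of the list return 0; the value is never used.
        return mem[ptr] if ptr < len(mem) else 0.0

    cntr = num_samples                 # Cntr = num_samples
    ptr1, ptr2 = address_1, address_2  # ptr1 = address_1; ptr2 = address_2
    mac = 0.0                          # MAC = 0
    reg1 = load(ptr1)                  # reg1 = *ptr1++  (pre-load)
    ptr1 += 1
    reg2 = load(ptr2)                  # reg2 = *ptr2++
    ptr2 += 1
    while cntr > 0:                    # Do LOOP until ce (counter expired)
        mac += reg1 * reg2             # MAC += reg1 * reg2, and in the same
        reg1 = load(ptr1)              # DSP instruction fetch the next operands
        ptr1 += 1
        reg2 = load(ptr2)
        ptr2 += 1
        cntr -= 1
    return mac


mem = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(dsp_inner_product(mem, 0, 3, 3))  # 32.0
```

On the DSP, the multiply-accumulate and the two loads issue as one single-cycle instruction, which is why the loop body costs one cycle per term in the timing discussed next; the Python model executes them sequentially.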



In this example, the DSP core 214 performs this inner product computed over a vector comprising 20 values and consumes a total of 26 cycles (6 + 20 × 1). This assumes typical DSP timing, including single cycle operation of instructions, zero overhead looping, and cache hits for all instructions and data. Thus, the DSP core 214 provides a performance increase of over 27 times (709 / 26 ≈ 27.3) compared with the X86 core 212 executing this DSP function.
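The cycle arithmetic behind this comparison can be checked with a few lines of Python, using only the setup and per-iteration totals quoted in the text (9 and 35 cycles for the i486 sequence, 6 and 1 cycles for the DSP routine). The helper name cycles is illustrative. The loop at the end shows that, under the same assumptions, the ratio approaches 35 as the vector length grows, because the fixed setup cost is amortized over more terms.

```python
def cycles(setup: int, per_iteration: int, n: int) -> int:
    """Total cycles for a loop with fixed setup and constant per-iteration cost."""
    return setup + per_iteration * n


x86 = cycles(9, 35, 20)   # i486 timing quoted in the text
dsp = cycles(6, 1, 20)    # single-cycle, zero-overhead DSP loop
print(x86, dsp, round(x86 / dsp, 1))  # 709 26 27.3

for n in (20, 100, 1000):
    print(n, round(cycles(9, 35, n) / cycles(6, 1, n), 1))
```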






Figure 4 - CPU Block Diagram

Referring now to Figure 4, a more detailed block diagram is shown, illustrating the internal components of the CPU 102 according to the present invention. Elements in the CPU 102 that are not necessary for an understanding of the present invention are not described, for simplicity. As shown, in the preferred embodiment the CPU 102 includes a bus interface unit 440, instruction cache 202, a data cache 444, an instruction decode unit 402, a plurality of execute units 448, a load/store unit 450, a reorder buffer 452, a register file 454, and a DSP unit 214.

As shown, the CPU 102 includes a bus interface unit 440 which includes circuitry for performing communication upon CPU bus 104. The bus interface unit 440 interfaces to the data cache 444 and the instruction cache 202. The instruction cache 202 prefetches instructions from the system memory 110 and stores the instructions for use by the CPU 102. The instruction decode unit 402 is coupled to the instruction cache 202 and receives instructions from the instruction cache 202. The instruction decode unit 402 includes function preprocessor 204, as shown. The function preprocessor 204 in the instruction decode unit 402 is coupled to the instruction cache 202. The instruction decode unit 402 further includes an instruction alignment unit as well as other logic.

The instruction decode unit 402 couples to a plurality of execution units 448, reorder buffer 452, and load/store unit 450. The plurality of execute units are collectively referred to herein as execute units 448. Reorder buffer 452, execute units 448, and load/store unit 450 are each coupled to a forwarding bus 458 for forwarding of execution results. Load/store unit 450 is coupled to data cache 444. DSP unit 214 is coupled directly to the instruction decode unit 402 through the DSP dispatch bus 456. It is noted that one or more DSP units 214 may be coupled to the instruction decode unit 402.

Bus interface unit 440 is configured to effect communication between microprocessor 102 and devices
coupled to system bus 104. For example, instruction fetches which miss instruction cache 202 are transferred
from main memory 110 by bus interface unit 440. Similarly, data requests performed by load/store unit 450
which miss data cache 444 are transferred from main memory 110 by bus interface unit 440. Additionally, data
cache 444 may discard a cache line of data which has been modified by microprocessor 102. Bus interface unit
440 transfers the modified line to main memory 110.
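
The miss and write-back flow described above can be illustrated with a toy model. The following is a minimal Python sketch, assuming a direct-mapped, write-back data cache with 4-word lines and a dictionary standing in for main memory 110. The class names `BusInterface` and `DirectMappedWriteBackCache` are hypothetical, and the patent does not specify line size, geometry, or replacement details.

```python
class BusInterface:
    """Hypothetical stand-in for bus interface unit 440: moves whole lines to and from memory."""

    def __init__(self, main_memory, line_words=4):
        self.main_memory = main_memory  # line address -> list of words
        self.line_words = line_words
        self.reads = 0
        self.writebacks = 0

    def read_line(self, line_addr):
        self.reads += 1
        return list(self.main_memory.get(line_addr, [0] * self.line_words))

    def write_line(self, line_addr, data):
        self.writebacks += 1
        self.main_memory[line_addr] = list(data)


class DirectMappedWriteBackCache:
    """Toy data cache: one line per index, a dirty bit per line, write-back on eviction."""

    def __init__(self, num_lines, bus):
        self.num_lines = num_lines
        self.bus = bus
        self.lines = [None] * num_lines  # each entry: [line_addr, data, dirty]

    def _fill(self, addr):
        line_addr, offset = divmod(addr, self.bus.line_words)
        index = line_addr % self.num_lines
        entry = self.lines[index]
        if entry is None or entry[0] != line_addr:
            # Miss: write back a modified victim first, then fetch the line over the bus.
            if entry is not None and entry[2]:
                self.bus.write_line(entry[0], entry[1])
            entry = [line_addr, self.bus.read_line(line_addr), False]
            self.lines[index] = entry
        return entry, offset

    def load(self, addr):
        entry, offset = self._fill(addr)
        return entry[1][offset]

    def store(self, addr, value):
        entry, offset = self._fill(addr)
        entry[1][offset] = value
        entry[2] = True  # line now differs from main memory
```

With two lines, a store to address 0 followed by a load from address 8 maps both to index 0, so the modified line is written back over the bus before the new line is fetched.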

Instruction cache 202 is preferably a high speed cache memory for storing instructions. It is noted that
instruction cache 202 may be configured into a set-associative or direct mapped configuration. Instruction cache
202 may additionally include a branch prediction mechanism for predicting branch instructions as either taken or
not taken. A "taken" branch instruction causes instruction fetch and execution to continue at the target address
of the branch instruction. A "not taken" branch instruction causes instruction fetch and execution to continue at
the instruction subsequent to the branch instruction. Instructions are fetched from instruction cache 202 and
conveyed to instruction decode unit 402 for decode and dispatch to an execution unit. The instruction cache 202
may also include a prediction mechanism for predicting macro instructions and taking the appropriate
action.
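
The taken/not-taken behavior can be illustrated with a per-branch two-bit saturating counter, a common textbook predictor. The patent does not say which prediction scheme instruction cache 202 uses, so the counter, its assumed initial "weakly not taken" state, and the `next_fetch_address` helper are all assumptions made for illustration.

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counters = {}  # branch address -> counter value

    def predict_taken(self, branch_addr):
        return self.counters.get(branch_addr, 1) >= 2  # assumed start: weakly not taken

    def update(self, branch_addr, taken):
        c = self.counters.get(branch_addr, 1)
        self.counters[branch_addr] = min(c + 1, 3) if taken else max(c - 1, 0)


def next_fetch_address(predictor, branch_addr, branch_len, target_addr):
    """A predicted-taken branch redirects fetch to its target; otherwise fetch falls through."""
    if predictor.predict_taken(branch_addr):
        return target_addr
    return branch_addr + branch_len
```

The two-bit hysteresis means a single atypical outcome does not immediately flip the prediction of a strongly biased branch.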



The instruction decode unit 402 decodes instructions received from the instruction cache 202 and provides
the decoded instructions to the execute units 448, the load/store unit 450, or the DSP unit 214. The instruction
decode unit 402 is preferably configured to dispatch an instruction to more than one execute unit 448.

The instruction decode unit 402 includes function preprocessor 204. According to the present
invention, the function preprocessor 204 in the instruction decode unit 402 is configured to detect X86
instruction sequences in the instruction cache 202 which correspond to or perform DSP functions. If such an
instruction sequence is detected, the function preprocessor 204 generates a corresponding macro and parameters
and transmits the corresponding DSP macro and parameters to the DSP unit 214 upon DSP dispatch bus 456.
The DSP unit 214 receives the DSP function macro and parameter information from the instruction decode unit
402 and performs the indicated DSP function. Additionally, DSP unit 214 is preferably configured to access
data cache 444 for data operands. Data operands may be stored in a memory within DSP unit 214 for quicker
access, or may be accessed directly from data cache 444 when needed. Function preprocessor 204 provides
feedback to instruction cache 202 to ensure that sufficient look-ahead instructions are available for macro
searching.

If the X86 instructions in the instruction cache 202 are not intended to perform a DSP function, the
instruction decode unit 402 decodes the instructions fetched from instruction cache 202 and dispatches the 
instructions to execute units 448 and/or load/store unit 450. Instruction decode unit 402 also detects the register 
operands used by the instruction and requests these operands from reorder buffer 452 and register file 454. 
Execute units 448 execute the X86 instructions as is known in the art. 

Also, if the DSP unit 214 is not included in the CPU 102 or is disabled through software, instruction decode
unit 402 dispatches all X86 instructions to execute units 448. Execute units 448 execute the X86 instructions as
in the prior art. In this manner, if the DSP unit 214 is disabled, the X86 code, including the instructions which
perform DSP functions, is executed by the X86 core, as is currently done in prior art X86 microprocessors.
Thus, if the DSP unit 214 is disabled, the program executes correctly even though operation is less efficient than
the execution of a corresponding routine in the DSP unit 214. Advantageously, the enabling or disabling, or the
presence or absence, of the DSP core 214 in the CPU 102 does not affect the correct operation of the program.
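
The two paragraphs above describe one routing decision with two outcomes. The following Python sketch models it for a single hypothetical inner-product loop: the opcode tuple `INNER_PRODUCT_SEQ`, the table `KNOWN_DSP_SEQUENCES`, and the stand-in `dsp_execute` are assumptions made for illustration, not the patent's encoding. The point is only that the DSP path and the X86 path return the same result, so enabling, disabling, or omitting the DSP unit changes which unit does the work, not what the program computes.

```python
# Hypothetical opcode sequence for a scalar multiply-accumulate loop body.
INNER_PRODUCT_SEQ = ("mov", "imul", "add", "inc", "cmp", "jl")

# Hypothetical recognizer table: instruction sequence -> DSP macro identifier.
KNOWN_DSP_SEQUENCES = {INNER_PRODUCT_SEQ: "INNER_PRODUCT"}


def x86_inner_product(a, b):
    """General purpose core path: one multiply and one add per loop iteration."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def dsp_execute(macro, params):
    """Stand-in for DSP unit 214 executing one macro with its parameters."""
    if macro == "INNER_PRODUCT":
        return sum(x * y for x, y in zip(params["a"], params["b"]))
    raise ValueError(f"unknown DSP macro {macro!r}")


def run_inner_product_loop(opcodes, a, b, dsp_enabled=True):
    """Return (path taken, result) for the one loop this toy models."""
    macro = KNOWN_DSP_SEQUENCES.get(tuple(opcodes)) if dsp_enabled else None
    if macro is not None:
        return "dsp", dsp_execute(macro, {"a": a, "b": b})
    # Unrecognized, or DSP unit absent/disabled: the X86 core runs the loop as written.
    return "x86", x86_inner_product(a, b)
```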

In one embodiment, execute units 448 are symmetrical execution units that are each configured to
execute the instruction set employed by microprocessor 102. In another embodiment, execute units 448 are
asymmetrical execution units configured to execute dissimilar instruction subsets. For example, execute units
448 may include a branch execute unit for executing branch instructions, one or more arithmetic/logic units for
executing arithmetic and logical instructions, and one or more floating point units for executing floating point
instructions. Instruction decode unit 402 dispatches an instruction to an execute unit 448 or load/store unit 450
which is configured to execute that instruction.

Load/store unit 450 provides an interface between execute units 448 and data cache 444. Load and
store memory operations are performed by load/store unit 450 to data cache 444. Additionally, memory
dependencies between load and store memory operations are detected and handled by load/store unit 450.

Execute units 448 and load/store unit(s) 450 may include one or more reservation stations for storing
instructions whose operands have not yet been provided. An instruction is selected from those stored in the
reservation stations for execution if: (1) the operands of the instruction have been provided, and (2) the
instructions which are prior to the instruction being selected have not yet received operands. It is noted that a
centralized reservation station may be included instead of separate reservation stations. The centralized
reservation station is coupled between instruction decode unit 402, execute units 448, and load/store unit 450.
Such an embodiment may perform the dispatch function within the centralized reservation station.
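
The two-part selection rule above amounts to issuing the oldest instruction whose operands are all available. A minimal sketch, assuming each reservation station is a Python list kept in program order and each operand is either a value or `None` while it is still outstanding; `RSEntry` and `select_for_execution` are illustrative names, not structures named in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class RSEntry:
    """One reservation-station slot; an operand is None while it is still outstanding."""
    name: str
    operands: list = field(default_factory=list)

    def ready(self):
        return all(op is not None for op in self.operands)


def select_for_execution(station):
    """Return the oldest entry (list order = program order) whose operands are all provided.

    Every entry ahead of the returned one is, by construction, still missing an
    operand, which is condition (2) of the selection rule.
    """
    for entry in station:
        if entry.ready():
            return entry
    return None
```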
CPU 102 preferably supports out of order execution and employs reorder buffer 452 for storing
execution results of speculatively executed instructions and storing these results into register file 454 in program
order, for performing dependency checking and register renaming, and for providing for mispredicted branch
and exception recovery. When an instruction is decoded by instruction decode unit 402, requests for register
operands are conveyed to reorder buffer 452 and register file 454. In response to the register operand requests,
one of three values is transferred to the execute unit 448 and/or load/store unit 450 which receives the
instruction: (1) the value stored in reorder buffer 452, if the value has been speculatively generated; (2) a tag
identifying a location within reorder buffer 452 which will store the result, if the value has not been speculatively
generated; or (3) the value stored in the register within register file 454, if no instructions within reorder buffer
452 modify the register. Additionally, a storage location within reorder buffer 452 is allocated for storing the
results of the instruction being decoded by instruction decode unit 402. The storage location is identified by a
tag, which is conveyed to the unit receiving the instruction. It is noted that, if more than one reorder buffer
storage location is allocated for storing results corresponding to a particular register, the value or tag
corresponding to the last result in program order is conveyed in response to a register operand request for that
particular register.
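
The three possible responses to a register operand request can be sketched as a search from the youngest reorder buffer entry backwards, which also yields the "last result in program order" rule. This is an illustrative model only: `ReorderBuffer`, `read_operand`, and the dictionary-based entries are assumptions, not the internal organization of reorder buffer 452.

```python
class ReorderBuffer:
    """Toy reorder buffer: entries kept in program order, each tagged at allocation."""

    def __init__(self):
        self.entries = []  # dicts {"tag", "dest", "value"}; value is None until executed
        self.next_tag = 0

    def allocate(self, dest_reg):
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append({"tag": tag, "dest": dest_reg, "value": None})
        return tag


def read_operand(rob, register_file, reg):
    """Return ("value", v) or ("tag", t) for one register operand request.

    (1) newest ROB entry writing reg already has a speculative value -> that value
    (2) newest ROB entry writing reg is still pending                -> its tag
    (3) no ROB entry writes reg                                      -> register file value
    """
    for entry in reversed(rob.entries):  # last result in program order wins
        if entry["dest"] == reg:
            if entry["value"] is not None:
                return ("value", entry["value"])
            return ("tag", entry["tag"])
    return ("value", register_file[reg])
```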

When execute units 448 or load/store unit 450 execute an instruction, the tag assigned to the instruction
by reorder buffer 452 is conveyed upon result bus 458 along with the result of the instruction. Reorder buffer
452 stores the result in the indicated storage location. Additionally, execute units 448 and load/store unit 450
compare the tags conveyed upon result bus 458 with tags of operands for instructions stored therein. If a match
occurs, the unit captures the result from result bus 458 and stores it with the corresponding instruction. In this
manner, an instruction may receive the operands it is intended to operate upon. Capturing results from result bus
458 for use by instructions is referred to as "result forwarding".
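
Result forwarding reduces to a tag comparison on every result bus broadcast. The sketch below assumes each waiting operand is represented as a `("value", v)` or `("tag", t)` pair and that the reorder buffer's storage locations are a dictionary keyed by tag; `broadcast_result` is a hypothetical helper, not a unit described in the patent.

```python
def broadcast_result(rob_values, waiting, tag, result):
    """Model one broadcast of (tag, result) on result bus 458.

    rob_values: dict tag -> result, standing in for reorder buffer storage locations.
    waiting: instructions held in execute or load/store units, each a dict whose
    "operands" list holds ("value", v) or ("tag", t) pairs.
    Returns how many operands captured the result.
    """
    rob_values[tag] = result  # the reorder buffer stores the result at the tagged location
    captured = 0
    for instr in waiting:
        for i, (kind, payload) in enumerate(instr["operands"]):
            if kind == "tag" and payload == tag:
                instr["operands"][i] = ("value", result)  # operand captured from the bus
                captured += 1
    return captured
```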

Instruction results are stored into register file 454 by reorder buffer 452 in program order. Storing the
results of an instruction and deleting the instruction from reorder buffer 452 is referred to as "retiring" the
instruction. By retiring the instructions in program order, recovery from incorrect speculative execution may be
performed. For example, if an instruction is subsequent to a branch instruction whose taken/not taken prediction
is incorrect, then the instruction may be executed incorrectly. When a mispredicted branch instruction or an
instruction which causes an exception is detected, reorder buffer 452 discards the instructions subsequent to the
mispredicted branch instruction. Instructions thus discarded are also flushed from execute units 448, load/store
unit 450, and instruction decode unit 402.
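
In-order retirement and squashing of younger work can be sketched as follows. This is a simplification under stated assumptions: entries are dictionaries in a `collections.deque`, and the misprediction is handled when the branch reaches the head of the buffer, whereas the text above allows recovery as soon as the misprediction is detected. `retire` is a hypothetical helper.

```python
from collections import deque


def retire(rob, register_file):
    """Retire finished entries from the head of rob (a collections.deque) in program order.

    Each entry is a dict with "dest", "value", "done", and optionally "mispredicted".
    Retirement stops at the first unfinished entry. A retiring mispredicted branch
    discards every younger entry. Returns the number of entries retired.
    """
    retired = 0
    while rob and rob[0]["done"]:
        entry = rob.popleft()
        if entry["dest"] is not None:
            register_file[entry["dest"]] = entry["value"]  # commit architectural state
        retired += 1
        if entry.get("mispredicted"):
            rob.clear()  # flush speculative results after the mispredicted branch
            break
    return retired
```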

Register file 454 includes storage locations for each register defined by the microprocessor architecture
employed by microprocessor 102. For example, in the preferred embodiment where the CPU 102 includes an
x86 microprocessor architecture, the register file 454 includes locations for storing the EAX, EBX, ECX, EDX,
ESI, EDI, ESP, and EBP register values.

Data cache 444 is a high speed cache memory configured to store data to be operated upon by
microprocessor 102. It is noted that data cache 444 may be configured into a set-associative or direct-mapped
configuration.
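
The difference between the set-associative and direct-mapped configurations mentioned for both caches can be shown with an address split and a small lookup. The geometry (16-byte lines, 64 sets) and the least-recently-used replacement policy are assumptions for illustration; the patent specifies neither, and `split_address`, `make_cache`, and `access` are hypothetical names.

```python
def split_address(addr, line_bytes, num_sets):
    """Split a byte address into (tag, set index, byte offset within the line)."""
    line_number, offset = divmod(addr, line_bytes)
    tag, index = divmod(line_number, num_sets)
    return tag, index, offset


def make_cache(num_sets, ways):
    """ways=1 gives a direct-mapped cache; ways>1 gives a set-associative one."""
    return {"ways": ways, "sets": [[] for _ in range(num_sets)]}


def access(cache, addr, line_bytes):
    """Return True on a hit. On a miss, install the tag, evicting the least recently used way."""
    tag, index, _ = split_address(addr, line_bytes, len(cache["sets"]))
    resident = cache["sets"][index]  # tags in this set, least recently used first
    if tag in resident:
        resident.remove(tag)
        resident.append(tag)
        return True
    if len(resident) == cache["ways"]:
        resident.pop(0)
    resident.append(tag)
    return False
```

Two addresses 1 KiB apart fall in the same set under this geometry, so alternating between them misses every time in a direct-mapped cache but hits after warm-up in a 2-way cache.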

Figure 5 - Instruction Decode Unit

Referring now to Figure 5, one embodiment of instruction decode unit 402 is shown. Instruction decode
unit 402 includes an instruction alignment unit 460, a plurality of decoder circuits 462, and a DSP function
preprocessor 204. Instruction alignment unit 460 is coupled to receive instructions fetched from instruction
cache 202 and aligns instructions to decoder circuits 462.

Instruction alignment unit 460 routes instructions to decoder circuits 462. In one embodiment,
instruction alignment unit 460 includes a byte queue in which instruction bytes fetched from instruction cache
202 are queued. Instruction alignment unit 460 locates valid instructions from within the byte queue and
dispatches the instructions to respective decoder circuits 462. In another embodiment, instruction cache 202
includes predecode circuitry which predecodes instruction bytes as they are stored into instruction cache 202.
Start and end byte information indicative of the beginning and end of instructions is generated and stored within
instruction cache 202. The predecode data is transferred to instruction alignment unit 460 along with the
instructions, and instruction alignment unit 460 transfers instructions to the decoder circuits 462 according to the
predecode information.
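
Using start and end predecode bits, the alignment step becomes a simple grouping pass over the byte queue. A minimal sketch, assuming one start bit and one end bit per queued byte; `split_by_predecode` is a hypothetical name, and the example bytes are ordinary 32-bit x86 encodings (`89 D8` mov eax, ebx; `40` inc eax; `83 C0 01` add eax, 1) followed by the first byte of an incomplete `B8` mov.

```python
def split_by_predecode(byte_queue, start_bits, end_bits):
    """Group queued instruction bytes into whole instructions using predecode marks.

    start_bits[i] / end_bits[i] are True when byte i begins / ends an instruction.
    Returns one tuple of bytes per complete instruction; a trailing partial
    instruction (start seen, end not yet fetched) is left for a later cycle.
    """
    instructions = []
    current = None
    for byte, is_start, is_end in zip(byte_queue, start_bits, end_bits):
        if is_start:
            current = []
        if current is not None:
            current.append(byte)
            if is_end:
                instructions.append(tuple(current))
                current = None
    return instructions
```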

The function preprocessor 204 is also coupled to the instruction cache 202 and operates to detect
instruction sequences in the instruction cache 202 which perform DSP functions. Decoder circuits 462 and
function preprocessor 204 receive X86 instructions from the instruction alignment unit 460. The function
preprocessor 204 provides an instruction disable signal upon a DSP bus to each of the decoder circuits 462.

Each decoder circuit 462 decodes the instruction received from instruction alignment unit 460 to
determine the register operands manipulated by the instruction as well as the unit to receive the instruction. An
indication of the unit to receive the instruction as well as the instruction itself are conveyed upon a plurality of
dispatch buses 468 to execute units 448 and load/store unit 450. Other buses, not shown, are used to request
register operands from reorder buffer 452 and register file 454.

The function preprocessor 204 analyzes streams or sequences of X86 instructions from the instruction cache
202 and determines if a DSP function is being executed. If so, the function preprocessor 204 maps the X86
instruction stream to a DSP macro and zero or more parameters and provides this information to one of the one
or more DSP units 214. In one embodiment, when the detected instruction sequence reaches the decoder
circuits 462, the function preprocessor 204 asserts a disable signal to each of the decoders 462 to disable
operation of the decoders 462 for the detected instruction sequence. When a decoder circuit 462 detects the
disable signal from the function preprocessor 204, the decoder circuit 462 discontinues decoding operations until
the disable signal is released. After the instruction sequence implementing the DSP function has exited the
instruction cache 202, the function preprocessor 204 removes the disable signal to each of the decoders 462. In
other words, once the function preprocessor 204 detects the end of the X86 instruction sequence, the function
preprocessor 204 removes the disable signal to each of the decoders 462, and the decoders 462 resume operation.
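
The effect of the disable signal on the decoders can be modeled at the level of instruction positions. A minimal sketch, assuming the preprocessor reports each detected sequence as a hypothetical `(start, end, macro)` index range with `end` exclusive; `decode_stream` is an illustrative name and does not model cycle timing.

```python
def decode_stream(instructions, detected_spans):
    """Model decoders 462 honoring the function preprocessor's disable signal.

    detected_spans: list of (start, end, macro) ranges, end exclusive, that the
    preprocessor mapped to DSP macros. While the stream position is inside a span
    the disable signal is asserted; the macro goes to the DSP unit once, and the
    decoders resume at the first instruction past the span.
    Returns (instructions decoded for the X86 core, macros issued to the DSP unit).
    """
    span_at = {start: (end, macro) for start, end, macro in detected_spans}
    x86_decoded, dsp_issued = [], []
    pos = 0
    while pos < len(instructions):
        if pos in span_at:
            end, macro = span_at[pos]
            dsp_issued.append(macro)  # disable asserted for the whole sequence
            pos = end                 # disable released at the end of the sequence
        else:
            x86_decoded.append(instructions[pos])
            pos += 1
    return x86_decoded, dsp_issued
```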

Each of decoder circuits 462 is configured to convey an instruction upon one of dispatch buses 468,
along with an indication of the unit or units to receive the instruction. In one embodiment, a bit is included
within the indication for each of execute units 448 and load/store unit 450. If a particular bit is set, the
corresponding unit is to execute the instruction. If a particular instruction is to be executed by more than one
unit, more than one bit in the indication may be set.
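
The one-bit-per-unit indication can be sketched as a bitmask. The particular set of units and their bit positions in `UNIT_BITS` are assumptions for illustration; the patent only states that there is one bit per execute unit and for the load/store unit.

```python
# Hypothetical bit assignment: one bit per execute unit 448 plus load/store unit 450.
UNIT_BITS = {"alu0": 0, "alu1": 1, "branch": 2, "fpu": 3, "load_store": 4}


def make_indication(*units):
    """Set one bit for each unit that is to receive the instruction."""
    mask = 0
    for unit in units:
        mask |= 1 << UNIT_BITS[unit]
    return mask


def receiving_units(mask):
    """List the units whose bit is set in a dispatch indication."""
    return [unit for unit, bit in UNIT_BITS.items() if mask & (1 << bit)]
```

An instruction with a memory operand that also needs an ALU operation might, for example, set both an ALU bit and the load/store bit.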
Function Preprocessor

Referring now to Figure 6, a block diagram of the function preprocessor 204 is shown according to one
embodiment of the invention. As shown, in this embodiment the function preprocessor 204 comprises a
scan-ahead circuit 502 for examining or scanning sequences of instructions in the instruction memory or
instruction cache 202. In one embodiment, the scan-ahead circuit or means 502 examines sequences of
instructions stored in the instruction memory 202 prior to operation of the instruction decoder 402 in decoding
the instructions comprising the respective sequence of instructions being scanned. Thus, the scan-ahead circuit
502 looks ahead at instruction sequences in the instruction cache 202 before the respective instructions are
provided to the instruction decoder 402.

The function preprocessor 204 further comprises an instruction sequence determination circuit 504 for
determining whether a sequence of instructions in the instruction memory 202 implements a digital signal
processing function. This determination can be performed in various ways, as described further below.

The function preprocessor 204 further comprises a conversion / mapping circuit 506 for converting a
sequence of instructions in the instruction memory 202 which implements a digital signal processing function
into a digital signal processing function identifier or macro identifier and zero or more parameters. Thus, if the
instruction sequence determination circuit 504 determines that a sequence of instructions in the instruction
memory 202 implements an FFT function, the conversion / mapping circuit 506 converts this sequence of
instructions into an FFT function identifier and zero or more parameters.
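
The three blocks of Figure 6 can be chained in software to show the data flow from scanned instructions to an identifier plus parameters. Everything specific here is hypothetical: the `FUNCTION_IDS` values, the mnemonic-prefix test in `determine` (a placeholder for whichever detection method is used), and the rule that an immediate loaded into ECX becomes a length parameter.

```python
# Hypothetical function identifiers for the DSP macro interface.
FUNCTION_IDS = {"fft": 0x01, "inner_product": 0x02}


def scan_ahead(imem, decode_pos, window):
    """Scan-ahead circuit 502: the next `window` instructions the decoder has not reached."""
    return imem[decode_pos:decode_pos + window]


def determine(seq, signatures):
    """Determination circuit 504 (placeholder): name the function whose signature starts seq."""
    mnemonics = tuple(m for m, _ in seq)
    for name, sig in signatures.items():
        if mnemonics[:len(sig)] == sig:
            return name
    return None


def convert(name, seq):
    """Conversion / mapping circuit 506: function identifier plus zero or more parameters.

    Hypothetical parameter rule: an immediate loaded into ECX is taken as the length.
    """
    params = [int(ops[1]) for m, ops in seq if m == "mov" and ops[0] == "ecx"]
    return FUNCTION_IDS[name], params[:1]


def preprocess(imem, decode_pos, window, signatures):
    """Return (identifier, params), or None to leave the code to the general purpose core."""
    seq = scan_ahead(imem, decode_pos, window)
    name = determine(seq, signatures)
    return None if name is None else convert(name, seq)
```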

Figure 7 - Pattern Recognition Circuit

Referring now to Figure 7, in one embodiment the function preprocessor 204 includes a pattern
recognition circuit or pattern recognition detector 512 which determines whether a sequence of instructions in
the instruction memory 202 implements a digital signal processing function. The pattern recognition circuit 512
stores a plurality of patterns of instruction sequences which implement digital signal processing functions. The
pattern recognition circuit 512 stores bit patterns which correspond to opcode sequences of machine language
instructions which perform DSP functions, such as FFTs, inner products, matrix manipulation, correlation,
convolution, etc.

The pattern recognition detector 512 examines sequences of instructions stored in the instruction
memory 202 and compares the sequence of instructions with the plurality of stored patterns. Operation of the
pattern recognition detector 512 is shown in Figure 8. In one embodiment, the pattern recognition detector 512
compares each of the patterns with an instruction sequence at periodic locations in the instruction sequence.
Alternatively, the pattern recognition detector 512 compares each of the patterns with an instruction sequence at
predefined locations in the instruction sequence. The pattern recognition detector 512 may include a look-up
table as the unit which performs the pattern comparisons, as desired. The pattern recognition detector 512 may
also perform macro prediction on instruction sequences to improve performance.

The pattern recognition detector 512 determines whether the sequence of instructions in the instruction
memory 202 substantially matches one of the plurality of stored patterns. A substantial match indicates that the
sequence of instructions implements a digital signal processing function. In the preferred embodiment, a
substantial match occurs where the instruction sequence matches a stored pattern by greater than 90%. Other
matching thresholds, such as 95% or 100%, may be used, as desired. If a match occurs, the pattern recognition
detector 512 determines the type of DSP function pattern which matched the sequence of instructions and passes
this DSP function type to the conversion / mapping circuit 506.
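
The "substantial match" test can be sketched as a per-position comparison scored against a threshold, tried at periodic offsets. The scoring rule (fraction of positions whose opcode equals the stored pattern's opcode, over equal-length windows) is an assumption, since the patent does not define how the match percentage is computed; `match_fraction` and `recognize` are hypothetical names.

```python
def match_fraction(window, pattern):
    """Fraction of positions where the window's opcode equals the pattern's (equal lengths)."""
    same = sum(1 for a, b in zip(window, pattern) if a == b)
    return same / len(pattern)


def recognize(seq, patterns, threshold=0.90, stride=1):
    """Return (function_type, offset) for the first substantial match, else None.

    Each stored pattern is compared at offsets 0, stride, 2*stride, ...; a match
    must be strictly greater than `threshold`, following "greater than 90%".
    """
    for offset in range(0, len(seq), stride):
        for func_type, pattern in patterns.items():
            window = seq[offset:offset + len(pattern)]
            if len(window) == len(pattern) and match_fraction(window, pattern) > threshold:
                return func_type, offset
    return None
```

With a 20-opcode pattern, one differing opcode scores 95% and is accepted, while two differences score exactly 90% and are rejected unless a lower threshold is chosen. Replacing the `range` with an explicit list of offsets would model the "predefined locations" variant.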

Figure 9 - Look-up Table

Referring now to Figure 9, in another embodiment the function preprocessor 204 includes a look-up
table 514 which determines whether a sequence of instructions in the instruction memory 202 implements a
digital signal processing function. In this embodiment, the look-up table 514 may be in addition to, or instead
of, the pattern recognition detector 512.

In an embodiment where the function preprocessor 204 includes only the look-up table 514, the look-up
table 514 stores a plurality of patterns wherein each of the patterns is at least a subset of an instruction sequence
which implements a digital signal processing function. Thus, this embodiment is similar to the embodiment of
Figure 6 described above, except that the function preprocessor 204 includes the look-up table 514 instead of the
pattern recognition detector 512 for detecting instruction sequences which implement DSP functions. In
addition, in this embodiment, the look-up table 514 stores smaller patterns which correspond to smaller
sequences of instructions, i.e., subsets of instruction sequences, which implement DSP functionality. In this
embodiment, the look-up table 514 requires an exact match with a corresponding sequence of instructions. If an
exact match does not occur, then the sequence of instructions is passed to the one or more general purpose
execution units, i.e., the general purpose CPU core, for execution.

Figure 10 illustrates operation of the look-up table 514 in this embodiment. As shown, a sequence of
instructions in the instruction cache 202 is temporarily stored in the instruction latch 542. The contents of the
instruction latch 542 are then compared with each of the entries in the look-up table 514 by element 546. If the
contents of the instruction latch 542 exactly match one of the entries in the look-up table 514, then the DSP
function or instruction 548 which corresponds to this entry is provided to the DSP execution unit 214.
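
The latch-and-compare operation of Figure 10 can be sketched as follows. The latch width and the table contents are hypothetical (the x87 triple `fld`/`fmul`/`faddp` is used only as a plausible multiply-accumulate idiom), and `lookup_dsp` is an illustrative name. The explicit loop mirrors comparison element 546 checking every entry; a plain dictionary lookup would give the same result.

```python
LATCH_LEN = 3  # assumed width of instruction latch 542, in instructions

# Hypothetical contents of look-up table 514: exact opcode subsequences -> DSP operation 548.
LOOKUP_TABLE = {
    ("fld", "fmul", "faddp"): "MAC",
    ("fld", "fadd", "fstp"): "VADD",
}


def lookup_dsp(icache, pos, table=LOOKUP_TABLE, latch_len=LATCH_LEN):
    """Latch the next latch_len instructions and return the DSP operation on an exact hit."""
    latch = tuple(icache[pos:pos + latch_len])  # instruction latch 542
    for entry, dsp_op in table.items():          # comparison element 546, one entry at a time
        if latch == entry:
            return dsp_op
    return None  # no exact match: the sequence goes to the general purpose core
```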

In the above embodiments of Figures 7 and 9, the pattern recognition detector 512 and/or the look-up
table 514 are configured to determine that an instruction sequence implements a DSP function only when the
determination can be made with relative certainty. This is because a "missed" instruction sequence, i.e., an
instruction sequence which implements a DSP function but which is not identified as implementing a DSP
function, will not affect correct operation of the program; the general purpose core simply executes
the instruction sequence. However, an instruction sequence which does not implement a DSP function that is
misidentified as a sequence which does implement a DSP function is more problematic and could result in
erroneous operation. Thus, it is anticipated that the pattern recognition detector 512 or the look-up table
514 may not accurately detect every instruction sequence which implements a DSP function. In this instance, the
instruction sequence is passed on to one of the general purpose execution units, as occurs in the prior art.



Figure 11 - Pattern Recognition Circuit with Look-up Table

Referring now to Figure 11, in another embodiment the function preprocessor 204 includes both the
look-up table 514 and the pattern recognition detector 512. In this embodiment, the function preprocessor 204
uses each of the look-up table 514 and the pattern recognition detector 512 to determine whether a sequence of
instructions in the instruction memory 202 implements a digital signal processing function. This embodiment
preferably uses a two stage analysis of a sequence of X86 instructions, whereby the look-up table 514 first
determines if the sequence likely implements a DSP function, and then the pattern recognition detector 512
determines the type of DSP function being implemented. Alternatively, the pattern recognition detector 512 first
determines if the sequence likely implements a DSP function, and then the look-up table 514 determines the type
of DSP function being implemented.

In this embodiment, the look-up table 514 stores small patterns which correspond to atomic DSP
instructions. For example, the look-up table 514 stores a pattern of X86 instructions which perform a multiply
accumulate add function, which is common in DSP architectures. The look-up table 514 also stores other
patterns which implement atomic DSP instructions. The pattern recognition detector 512 stores patterns
corresponding to entire DSP functions, such as an FFT, a correlation, and a convolution, among others.

First, the look-up table 514 compares each entry with incoming instruction sequences and stores the
number of "hits" or matches for a sequence. If the number of matches is greater than a certain defined threshold,
then the sequence includes a number of DSP-type "instructions" and thus is presumed to implement a DSP
function. In this instance, the pattern recognition detector 512 is enabled to compare the entire sequence with
each of the stored patterns to determine the type of DSP function being implemented by the X86 instruction
sequence. As mentioned above, the pattern recognition detector 512 determines if the instruction sequence
substantially matches one of the stored patterns.
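
The two-stage analysis can be sketched by counting atomic-pattern hits first and running the whole-sequence comparison only when the count exceeds the threshold. The non-overlapping hit counting, the equal-length whole-sequence comparison, and the reuse of a 90% match threshold are assumptions; `count_atomic_hits` and `two_stage_classify` are hypothetical names.

```python
def count_atomic_hits(seq, atomic_patterns):
    """Stage 1 (look-up table 514): count non-overlapping occurrences of atomic DSP patterns."""
    hits, i = 0, 0
    while i < len(seq):
        for pat in atomic_patterns:
            if tuple(seq[i:i + len(pat)]) == pat:
                hits += 1
                i += len(pat)
                break
        else:
            i += 1  # no atomic pattern starts here; advance one instruction
    return hits


def two_stage_classify(seq, atomic_patterns, function_patterns, hit_threshold, match_threshold=0.90):
    """Stage 2 (pattern recognition detector 512) runs only if stage 1 exceeds hit_threshold."""
    if count_atomic_hits(seq, atomic_patterns) <= hit_threshold:
        return None
    for func_type, pattern in function_patterns.items():
        if len(seq) == len(pattern):
            same = sum(a == b for a, b in zip(seq, pattern))
            if same / len(pattern) > match_threshold:
                return func_type
    return None
```

Gating stage 2 on the cheap hit count means the more expensive whole-function comparison is attempted only for sequences that already look DSP-like.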

Conclusion

Therefore, the present invention comprises a novel CPU or microprocessor architecture which
optimizes execution of DSP and/or mathematical operations while maintaining backwards compatibility with
existing software.

Although the system and method of the present invention have been described in connection with the
preferred embodiment, it is not intended to be limited to the specific form set forth herein, but on the contrary, it
is intended to cover such alternatives, modifications, and equivalents as can be reasonably included within the
spirit and scope of the invention as defined by the appended claims.

Claims

1. A central processing unit which performs digital signal processing functions, comprising:
an instruction memory for storing a plurality of instructions, wherein said instruction memory stores one
or more sequences of instructions which are intended to perform a digital signal processing function;

a function preprocessor coupled to the instruction memory, wherein the function preprocessor includes:
means for examining said one or more sequences of instructions stored in said instruction
memory;

means for determining whether a sequence of said instructions in said instruction memory is
intended to perform a digital signal processing function;

means for converting a sequence of said instructions in said instruction memory which is
intended to perform a digital signal processing function into a digital signal processing function identifier;

at least one general purpose processing core coupled to the function preprocessor for executing
instructions in said instruction memory;

at least one digital signal processing core coupled to the function preprocessor for performing digital
signal processing functions, wherein the at least one digital signal processing core receives said digital signal
processing function identifier and performs a digital signal processing function in response to said received
digital signal processing function identifier from said function preprocessor.

2. The central processing unit of claim 1, wherein said instruction memory stores a first sequence 
of instructions which does not perform a digital signal processing function, and wherein said instruction memory 
stores a second sequence of instructions which performs a digital signal processing function; 

wherein said at least one general purpose processing core executes said first sequence of instructions; 

wherein said at least one digital signal processing core performs said digital signal processing function 
in response to said received digital signal processing function identifier, wherein said digital signal processing
function performed by said digital signal processing core is substantially equivalent to execution of said second 
sequence of instructions. 

3. The central processing unit of claim 1, wherein said at least one digital signal processing core 
provides data and timing signals to said at least one general purpose processing core. 

4. The central processing unit of claim 1, wherein said function preprocessor generates a digital
signal processing function identifier and one or more parameters in response to said determining means 
determining that said sequence of instructions in said instruction memory is intended to perform a digital signal 
processing function. 

5. The central processing unit of claim 1, wherein said at least one general purpose processing 
core is compatible with the X86 family of microprocessors. 

6. The central processing unit of claim 5, wherein said plurality of instructions are X86 opcodes. 

7. The central processing unit of claim 1, wherein said at least one digital signal processing core
is adapted for performing one or more mathematical operations from the group consisting of convolution,
correlation, Fast Fourier Transforms, and inner product.

8. The central processing unit of claim 1, wherein said at least one general purpose processing
core and said at least one digital signal processing core operate substantially in parallel.

9. A method for executing instructions in a central processing unit (CPU), wherein the CPU
includes at least one general purpose CPU core and at least one digital signal processing (DSP) core, the method
comprising:

storing one or more sequences of instructions in an instruction memory for execution by the central 
processing unit; 

examining a sequence of instructions in said instruction memory;

determining whether said sequence of instructions in said instruction memory is intended to perform a 
digital signal processing function; 

converting said sequence of instructions in said instruction memory which is intended to perform a 
digital signal processing function into a digital signal processing function identifier;

the digital signal processing core receiving said digital signal processing function identifier; and

the digital signal processing core performing a digital signal processing function in response to said 
received digital signal processing function identifier. 



10. The method of claim 9, further comprising:

wherein said storing comprises storing a first sequence of instructions in said instruction memory which
performs a first digital signal processing function;

wherein said storing comprises storing a second sequence of instructions in said instruction memory 
which does not perform a digital signal processing function; 

wherein said converting converts said first sequence of instructions in said instruction memory which is 
intended to perform said first digital signal processing function into a first digital signal processing function 
identifier;

wherein said performing comprises said digital signal processing core performing said first digital 
signal processing function in response to said first digital signal processing function identifier, wherein said 
performing said first digital signal processing function is substantially equivalent to execution of said first 
sequence of instructions; and 

said general purpose central processing unit core executing said second sequence of instructions. 

11. The method of claim 10, further comprising:

said digital signal processing core and said general purpose central processing unit core operating
substantially in parallel.

12. The method of claim 10, further comprising:

said digital signal processing core providing data and timing signals to said general purpose central
processing unit core.



13. The method of claim 9, further comprising:

said function preprocessor generating a digital signal processing function identifier and one or more
parameters in response to said determining that said sequence of instructions in said instruction memory is
intended to perform a digital signal processing function.



14. The method of claim 9, wherein said general purpose central processing unit core is compatible
with the X86 family of microprocessors.



15. The method of claim 14, wherein said one or more sequences of instructions comprise X86
opcodes.

16. The method of claim 9, further comprising:

said digital signal processing core performing one or more mathematical operations from the group
consisting of convolution, correlation, Fast Fourier Transforms, and inner product.


